This script performs preprocessing and quality control (QC) filtering of single-cell RNA sequencing (scRNA-seq) data generated using conventional 10x Genomics, FLEX, and Parse Biosciences Mini and Midi kits. The R package CRMetrics is utilized for QC in this script. While the published version of CRMetrics (https://github.com/khodosevichlab/CRMetrics) is only compatible with conventional 10x Genomics data, a local adaptation has been implemented to support 10x FLEX and Parse Biosciences data. Additionally, this version incorporates compatibility with the latest CellBender release (version 0.3.0). For large datasets (~ 100,000 nuclei) memory limitations of R (e.g. with Conos and CRMetrics) necessitate transitioning to Python for data analysis. More information and application of the CRMetrics package can be found here: http://kkh.bric.ku.dk/rasmus/CRMetrics/walkthrough.html

QC filtering involves iterative adjustments to determine the optimal thresholds for filtering depth (unique molecular identifiers [UMIs] per cell), mitochondrial (mito) fraction (percentage of mitochondrial gene expression per cell), and doublets (two or more cells clustered together). Final clustering is not conducted in this script, it is addressed in a subsequent script, “Clustering_and_annotation”.

If ambient RNA filtering is applied, it is performed first, followed by filtering based on depth, mitochondrial fraction, and doublets. At the end of the pipeline, a new CRMetrics object is constructed using the filtered count matrices (=cms) to plot the violin plots for UMI and gene counts post-filtering.

Due to its computational intensity, ambient RNA filtering is not part of the standard analysis pipeline but is performed upon specific user request.

Setup

library(devtools)
# loading locally adapted CRMetrics version
#load_all("/people/nrq364/CRMetrics_LW")
library(CRMetrics)
library(magrittr)
library(dplyr)
library(conos)
library(pagoda2)
library(qs)

1. fastQC and multiQC

Provide links to multiQC reports of raw sequencing data and cellranger metrics. Does not yet work on cellranger multi (FLEX) and Parse output.

2. Create CRMetrics object

In case of Parse data use data.path to “combined” folder, for 10x FLEX data.path to “per_sample_outs” folder. Metadata should at least contain “sample” and “group/condition” but can have more columns.

crm <- CRMetrics$new(data.path = "/data/counts/[project_name]/", 
                     metadata = "path_to_metadata.csv", 
                     n.cores = 60)
crm$metadata
qsave(crm, "crm.qs", nthreads = 10)

3. Basic metrics

Show available metrics to plot.

crm$selectMetrics()

Plot selected metrics per sample

e.g. cells per sample, median genes, median UMIs, mean reads

metrics.to.plot <- crm$selectMetrics(ids = c(1:3,5,6))
crm$plotSummaryMetrics(plot.geom = "bar", comp.group = "sample", metrics = metrics.to.plot)

Plot selected metrics per condition

We can do the same, but set the comparison group to condition. This will add statistics to the plots. Additionaly, we can add a second comparison group for coloring.

crm$plotSummaryMetrics(comp.group = "condition",
                       metrics = metrics.to.plot, 
                       plot.geom = "point", 
                       stat.test = "non-parametric") #second.comp.group= "sex"

Add and pot detailed metrices

crm$addCms(add.metadata=F) #for Parse data: parse = T, for FLEX: flex = T
crm$addDetailedMetrics(n.cores = 30)

get total number of sequenced nuclei

sum(sapply(crm$cms, function(d) dim(d)[2]))
[1] 61975

plot UMIs and genes per sample before filtering, horizontal lines indicate the median values for all samples

crm$plotDetailedMetrics(comp.group = "condition",
                        metrics = c("UMI_count",  "gene_count"), 
                        plot.geom = "violin")

Generate a data frame with the metrics.

metrics <- crm$summary.metrics
metrics %<>% as.data.frame()
metrics

Get median genes per cell across all samples before filtering (in case of Parse: “hg38_median_genes_per_cell”).

median(metrics[metrics$metric=="median genes per cell",]$value)
[1] 3478

Get median UMIs per cell across all samples before filtering (in case of Parse: “hg38_median_tscp_per_cell”).

median(metrics[metrics$metric=="median umi counts per cell",]$value)
[1] 7945

If ambient RNA removal is part of the pipeline, it is performed here. Relevant code is located further down in the script under 8.

4. Embed cells using conos

To plot the cells in an embedding, we need to preprocess the raw count matrices with pagoda2 (seurat is also possible).

crm$doPreprocessing() #min.transcripts.per.cell = 100

Create embedding using conos. This functions executes the functions Conos$new(cms, n.cores = n.cores), buildGraph(), findCommunities(), embedGraph(method = “UMAP”), with default settings, which can be adjusted.

crm$createEmbedding()

Plot the embedding. This is before filtering, so we are not yet interested in the clusters, those are refined in the next script.

crm$plotEmbedding()

5. Filtering out low quality cells

Cell depth

Plot cell depth in the embedding or as histograms per sample.

crm$plotEmbedding(depth = TRUE, 
             depth.cutoff = 1000)

crm$plotDepth()

If the depth distribution varies between samples, we can use sample-specific cutoffs.

depth_cutoff_vec <- c(1e3, 1e3, 500, 1e3, 1e3, 500, 1e3, 1e3, 1e3, 1e3, 1e3, 500) %>% 
  setNames(crm$detailed.metrics$sample %>% 
             unique() %>% 
             sort())

depth_cutoff_vec
control_1 control_2 control_4 control_5 control_6 control_7 treated_1 treated_2 treated_3 treated_4 
     1000      1000       500      1000      1000       500      1000      1000      1000      1000 
treated_6 treated_7 
     1000       500 

Plot sample specific cutoff in histogram and embedding.

crm$plotDepth(cutoff = depth_cutoff_vec)

crm$plotEmbedding(depth = TRUE, 
             depth.cutoff = depth_cutoff_vec)

Mitochondrial gene fraction

Investigate the mitochondrial gene fraction in the cells. The example dataset has a very low expression of mitochondrial genes. Usual cutoff is 5% or higher.

crm$plotEmbedding(mito.frac = TRUE, 
             mito.cutoff = 0.01, 
             species = "mouse")

We can plot the distribution of the mitochondrial fraction per sample and also include sample-wise cutoffs (not shown here as the example dataset has only a very low expression of mitochondrial genes).

Doublet detection

Doublet detection within CRMetrics is possible with the python packages scrublet or DoubletDetection Scrublet is the default method, which is fast. DoubletDetection is significantly slower, but performs better according to this review (https://www.sciencedirect.com/science/article/pii/S2405471220304592). Whcih one to choose depends on the dataset and technology used.

Here we use scrublet as default method. A virtual environment is used to install scrublet.

library(reticulate)
conda_create("scrublet", 
             conda = "/opt/software/miniconda/4.12.0/condabin/conda", 
             python_version = 3.8)

Run scrublet from within R or generate a Python script to run it from the terminal. To do this, set export = TRUE and run python scrublet.py inside the virtual environment to generate data.

crm$detectDoublets(env = "scrublet",
                   conda.path = "/opt/software/miniconda/4.12.0/condabin/conda",
                   method = "scrublet")

Then, load data with:

crm$addDoublets()

Plot the estimated doublets in the embedding.

crm$plotEmbedding(doublet.method = "scrublet")

Plot filtered cells

Plot all cells to be filtered on the embedding.

crm$plotFilteredCells(type = "embedding", 
                      depth = TRUE, 
                      depth.cutoff = 1000, 
                      doublet.method = "scrublet", 
                      mito.frac = FALSE, 
                      species = "mouse")
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.

Plot the cells to be filtered per sample in a bar plot where combination means a cell that has at least two filter labels, e.g. doublet and depth.

crm$plotFilteredCells(type = "bar", 
                      depth = TRUE, 
                      depth.cutoff = 1000, 
                      doublet.method = "scrublet", 
                      mito.frac = FALSE, 
                      species = "mouse")

6. Filter and save count matrices

Finally, filter the count matrices to create a cleaned list to be used in downstream applications. If cellbender should be used then cms needs to be added after cellbender and then all subsequent filtering is done.

crm$filterCms(depth.cutoff = 1000, 
              mito.cutoff = NULL, 
              doublets = "scrublet",
              samples.to.exclude = NULL,
              species = "mouse",
              verbose = TRUE)

The filtered list of count matrices is stored in $cms.filtered which can be saved on disk.

qsave(crm$cms.filtered, "cms_filtered[_cellbender].qs", 
      nthreads = 10)

7. Plot detailed metrics after filtering

cms <- qread("cms_filtered[_cellbender].qs")

Create a new crm object with cms after filtering.

crm_filtered <- CRMetrics$new(cms = crm$cms.filtered, n.cores = 60, metadata="path_to_metadata.csv")
crm_filtered$addSummaryFromCms()
crm_filtered$addDetailedMetrics(n.cores = 30)

Total number of nuclei retained after QC:

sum(sapply(crm_filtered$cms, function(d) dim(d)[2]))

Plot UMIs and genes per sample after QC filtering, horizontal lines indicate the median values for all samples.

crm_filtered$plotDetailedMetrics(comp.group = "condition",
                        metrics = c("UMI_count",  "gene_count"), 
                        plot.geom = "violin")

Generate a data frame with the metrics.

metrics <- crm_filtered$summary.metrics
metrics %<>% as.data.frame()
metrics

Median genes per cell across all samples after QC filtering (in case of Parse: “hg38_median_genes_per_cell”).

median(metrics[metrics$metric=="median genes per cell",]$value)

Median UMIs per cell across all samples after QC filtering (in cas of Parse: “hg38_median_tscp_per_cell”).

median(metrics[metrics$metric=="median umi counts per cell",]$value)

8. Optional: remove ambient RNA

We use usually cellbender, SoupX is also included in CRMetrics. Perform this step before filtering, it changes the number of called cells and their gene counts.

In the old cellbender version (before 0.3.0) we needed to specify expected number of cells and total droplets included. As hinted in the manual, the number of total droplets included could be expected number of cells multiplied by 3, this can still be plotted here when setting show.total.droplets=T. Additionally, in case of Parse, the following function creates the input files for cellbender from the Parse output.

crm$prepareCellbender(show.expected.cells = T, show.total.droplets = T)
2025-04-30 10:32:28.050553 Started run using 12 cores
2025-04-30 10:32:28.057862 Using stored HDF5 Cell Ranger/Parse outputs. To overwrite, set $cms.raw <- NULL
2025-04-30 10:32:28.058687 Using stored UMI counts calculations. To overwrite, set $cellbender$umi.counts <- NULL
2025-04-30 10:32:28.059399 Plotting
2025-04-30 10:32:28.885673 Done!

The following function prepares and saves the cellbender script. To only select specific samples, use the samples parameter. total.droplets and expected.cells is a logical because in cellbender from v0.3.0 total.droplets and expected.cells do not need to be specified anymore.
○ FALSE is default
○ TRUE would take expected.cells from metadata and total droplets would be 3x expected cells
○ alternatively give own numbers (named vector)

crm$saveCellbenderScript(file = "cellbender_script.sh", 
                         fpr = 0.01, 
                         epochs = 150, 
                         use.gpu = TRUE, total.droplets = F, expected.cells = F) # could be total.droplets = droplets (named vector)

We can run this script in the terminal (screen session). Here, we activate the environment: conda activate cellbender_v0.3.0 and we run the bash script: sh cellbender_script.sh.

Plot cellbender results

Plot the changes in cell numbers following CellBender estimations.

crm$plotCbCells()

Plot the CellBender training results.

crm$plotCbTraining()

Plot the cell probabilities.

crm$plotCbCellProbs()

Plot the identified ambient genes per sample.

crm$plotCbAmbExp(cutoff = 0.005)

Lastly, plot the proportion of samples expressing ambient genes. MALAT1 is identified as an ambient gene in almost all samples which is expected.

crm$plotCbAmbGenes(cutoff = 0.005)

Add cellbender filtered cms

Then, we can add the cellbender filtered cms to our object. Additionally, it is recommended to generate new summary metrics from the adjusted cms which can then be plotted.

crm$cms <- NULL
crm$addCms(cellbender = T)
crm$addSummaryFromCms()
crm$detailed.metrics <- NULL
crm$addDetailedMetrics()
qsave(crm, "crm_cellbender.qs")
---
title: "1) Preprocessing and QC filtering"
date: "`r Sys.Date()`"
output:
  html_notebook:
    toc: true
    toc_float: true
    code_folding: none
---

This script performs preprocessing and quality control (QC) filtering of single-cell RNA sequencing (scRNA-seq) data generated using conventional 10x Genomics, FLEX, and Parse Biosciences Mini and Midi kits.
The R package CRMetrics is utilized for QC in this script. While the published version of CRMetrics (https://github.com/khodosevichlab/CRMetrics) is only compatible with conventional 10x Genomics data, a local adaptation has been implemented to support 10x FLEX and Parse Biosciences data. Additionally, this version incorporates compatibility with the latest CellBender release (version 0.3.0). For large datasets (~ 100,000 nuclei) memory limitations of R (e.g. with Conos and CRMetrics) necessitate transitioning to Python for data analysis. More information and application of the CRMetrics package can be found here: http://kkh.bric.ku.dk/rasmus/CRMetrics/walkthrough.html

QC filtering involves iterative adjustments to determine the optimal thresholds for filtering depth (unique molecular identifiers [UMIs] per cell), mitochondrial (mito) fraction (percentage of mitochondrial gene expression per cell), and doublets (two or more cells clustered together). Final clustering is not conducted in this script, it is addressed in a subsequent script, "Clustering_and_annotation".

If ambient RNA filtering is applied, it is performed first, followed by filtering based on depth, mitochondrial fraction, and doublets. At the end of the pipeline, a new CRMetrics object is constructed using the filtered count matrices (=cms) to plot the violin plots for UMI and gene counts post-filtering.

Due to its computational intensity, ambient RNA filtering is not part of the standard analysis pipeline but is performed upon specific user request.


# Setup
```{r, message=F}
library(devtools)
# loading locally adapted CRMetrics version
#load_all("/people/nrq364/CRMetrics_LW")
library(CRMetrics)
library(magrittr)
library(dplyr)
library(conos)
library(pagoda2)
library(qs)
```
# 1. fastQC and multiQC
Provide links to multiQC reports of raw sequencing data and cellranger metrics. Does not yet work on cellranger multi (FLEX) and Parse output.

# 2. Create CRMetrics object
In case of Parse data use data.path to "combined" folder, for 10x FLEX data.path to "per_sample_outs" folder. 
Metadata should at least contain "sample" and "group/condition" but can have more columns.
```{r}
crm <- CRMetrics$new(data.path = "/data/counts/[project_name]/", 
                     metadata = "path_to_metadata.csv", 
                     n.cores = 60)
```

```{r}
crm$metadata
```

```{r}
qsave(crm, "crm.qs", nthreads = 10)
```

# 3. Basic metrics

Show available metrics to plot.
```{r}
crm$selectMetrics()
```

## Plot selected metrics per sample
e.g. cells per sample, median genes, median UMIs, mean reads
```{r, fig.height=6, warning=F}
metrics.to.plot <- crm$selectMetrics(ids = c(1:3,5,6))
crm$plotSummaryMetrics(plot.geom = "bar", comp.group = "sample", metrics = metrics.to.plot)
```

## Plot selected metrics per condition
We can do the same, but set the comparison group to condition. This will add statistics to the plots. Additionaly, we can add a second comparison group for coloring.
```{r, fig.height=6, warning=F}
crm$plotSummaryMetrics(comp.group = "condition",
                       metrics = metrics.to.plot, 
                       plot.geom = "point", 
                       stat.test = "non-parametric") #second.comp.group= "sex"
```

## Add and pot detailed metrices
```{r}
crm$addCms(add.metadata=F) #for Parse data: parse = T, for FLEX: flex = T
crm$addDetailedMetrics(n.cores = 30)
```

get total number of sequenced nuclei
```{r}
sum(sapply(crm$cms, function(d) dim(d)[2]))
```

plot UMIs and genes per sample before filtering, horizontal lines indicate the median values for all samples
```{r, warning=F}
crm$plotDetailedMetrics(comp.group = "condition",
                        metrics = c("UMI_count",  "gene_count"), 
                        plot.geom = "violin")
```
Generate a data frame with the metrics.
```{r}
metrics <- crm$summary.metrics
metrics %<>% as.data.frame()
metrics
```
Get median genes per cell across all samples before filtering
(in case of Parse: "hg38_median_genes_per_cell").
```{r}
median(metrics[metrics$metric=="median genes per cell",]$value)
```

Get median UMIs per cell across all samples before filtering
(in case of Parse: "hg38_median_tscp_per_cell").
```{r}
median(metrics[metrics$metric=="median umi counts per cell",]$value)
```

If ambient RNA removal is part of the pipeline, it is performed here. Relevant code is located further down in the script under 8.

# 4. Embed cells using conos 
To plot the cells in an embedding, we need to preprocess the raw count matrices with pagoda2 (seurat is also possible).
```{r}
crm$doPreprocessing() #min.transcripts.per.cell = 100
```

Create embedding using conos. This functions executes the functions Conos$new(cms, n.cores = n.cores), buildGraph(), findCommunities(), embedGraph(method = "UMAP"), with default settings, which can be adjusted. 
```{r}
crm$createEmbedding()
```

Plot the embedding. This is before filtering, so we are not yet interested in the clusters, those are refined in the next script.
```{r}
crm$plotEmbedding()
```

# 5. Filtering out low quality cells
## Cell depth
Plot cell depth in the embedding or as histograms per sample.
```{r}
crm$plotEmbedding(depth = TRUE, 
             depth.cutoff = 1000)
```
```{r, fig.height=6, warning=F}
crm$plotDepth()
```

If the depth distribution varies between samples, we can use sample-specific cutoffs.
```{r}
depth_cutoff_vec <- c(1e3, 1e3, 500, 1e3, 1e3, 500, 1e3, 1e3, 1e3, 1e3, 1e3, 500) %>% 
  setNames(crm$detailed.metrics$sample %>% 
             unique() %>% 
             sort())

depth_cutoff_vec
```

Plot sample specific cutoff in histogram and embedding.
```{r, warning=F, fig.height=6}
crm$plotDepth(cutoff = depth_cutoff_vec)
```

```{r}
crm$plotEmbedding(depth = TRUE, 
             depth.cutoff = depth_cutoff_vec)
```

## Mitochondrial gene fraction
Investigate the mitochondrial gene fraction in the cells. The example dataset has a very low expression of mitochondrial genes. Usual cutoff is 5% or higher.
```{r}
crm$plotEmbedding(mito.frac = TRUE, 
             mito.cutoff = 0.01, 
             species = "mouse")
```

We can plot the distribution of the mitochondrial fraction per sample and also include sample-wise cutoffs (not shown here as the example dataset has only a very low expression of mitochondrial genes).
```{r, echo=TRUE, results='hide', fig.height=6, warning=F}
crm$plotMitoFraction(cutoff = 0.01)
```

## Doublet detection
Doublet detection within CRMetrics is possible with the python packages scrublet or DoubletDetection
Scrublet is the default method, which is fast. DoubletDetection is significantly slower, but performs better according to this review (https://www.sciencedirect.com/science/article/pii/S2405471220304592). Whcih one to choose depends on the dataset and technology used.

Here we use scrublet as default method.
A virtual environment is used to install scrublet.
```{r}
library(reticulate)
conda_create("scrublet", 
             conda = "/opt/software/miniconda/4.12.0/condabin/conda", 
             python_version = 3.8)
```


Run scrublet from within R or generate a Python script to run it from the terminal. To do this, set export = TRUE and run python scrublet.py inside the virtual environment to generate data.
```{r}
crm$detectDoublets(env = "scrublet",
                   conda.path = "/opt/software/miniconda/4.12.0/condabin/conda",
                   method = "scrublet")
```

Then, load data with:
```{r}
crm$addDoublets()
```

Plot the estimated doublets in the embedding.
```{r}
crm$plotEmbedding(doublet.method = "scrublet")
```


## Plot filtered cells
Plot all cells to be filtered on the embedding.
```{r}
crm$plotFilteredCells(type = "embedding", 
                      depth = TRUE, 
                      depth.cutoff = 1000, 
                      doublet.method = "scrublet", 
                      mito.frac = FALSE, 
                      species = "mouse")
```

Plot the cells to be filtered per sample in a bar plot where combination means a cell that has at least two filter labels, e.g. doublet and depth.
```{r, warning=F}
crm$plotFilteredCells(type = "bar", 
                      depth = TRUE, 
                      depth.cutoff = 1000, 
                      doublet.method = "scrublet", 
                      mito.frac = FALSE, 
                      species = "mouse")
```


# 6. Filter and save count matrices
Finally, filter the count matrices to create a cleaned list to be used in downstream applications.
If cellbender should be used then cms needs to be added after cellbender and then all subsequent filtering is done.
```{r}
crm$filterCms(depth.cutoff = 1000, 
              mito.cutoff = NULL, 
              doublets = "scrublet",
              samples.to.exclude = NULL,
              species = "mouse",
              verbose = TRUE)
```

The filtered list of count matrices is stored in $cms.filtered which can be saved on disk.
```{r}
qsave(crm$cms.filtered, "cms_filtered[_cellbender].qs", 
      nthreads = 10)
```

# 7. Plot detailed metrics after filtering
```{r}
cms <- qread("cms_filtered[_cellbender].qs")
```

Create a new crm object with cms after filtering.
```{r}
crm_filtered <- CRMetrics$new(cms = crm$cms.filtered, n.cores = 60, metadata="path_to_metadata.csv")
crm_filtered$addSummaryFromCms()
crm_filtered$addDetailedMetrics(n.cores = 30)
```

Total number of nuclei retained after QC:
```{r}
sum(sapply(crm_filtered$cms, function(d) dim(d)[2]))
```

Plot UMIs and genes per sample after QC filtering, horizontal lines indicate the median values for all samples.
```{r}
crm_filtered$plotDetailedMetrics(comp.group = "condition",
                        metrics = c("UMI_count",  "gene_count"), 
                        plot.geom = "violin")
```
Generate a data frame with the metrics.
```{r}
metrics <- crm_filtered$summary.metrics
metrics %<>% as.data.frame()
metrics
```
Median genes per cell across all samples after QC filtering (in case of Parse: "hg38_median_genes_per_cell").
```{r}
median(metrics[metrics$metric=="median genes per cell",]$value)
```

Median UMIs per cell across all samples after QC filtering (in cas of Parse: "hg38_median_tscp_per_cell").
```{r}
median(metrics[metrics$metric=="median umi counts per cell",]$value)
```


# 8. Optional: remove ambient RNA
```{r, include=F}
crm <- qread("crm_cellbender_without_ctrl3_and_treated5.qs", nthreads=10)
```

We use usually cellbender, SoupX is also included in CRMetrics.
Perform this step before filtering, it changes the number of called cells and their gene counts.

In the old cellbender version (before 0.3.0) we needed to specify expected number of cells and total droplets included. As hinted in the manual, the number of total droplets included could be expected number of cells multiplied by 3, this can still be plotted here when setting show.total.droplets=T.
Additionally, in case of Parse, the following function creates the input files for cellbender from the Parse output. 
```{r, fig.height=6, warning=F}
crm$prepareCellbender(show.expected.cells = T, show.total.droplets = T)
```


The following function prepares and saves the cellbender script. To only select specific samples, use the samples parameter. 
total.droplets and expected.cells is a logical because in cellbender from v0.3.0 total.droplets and expected.cells do not need to be specified anymore. <br>
		○ FALSE is default <br>
		○ TRUE would take expected.cells from metadata and total droplets would be 3x expected cells <br>
		○ alternatively give own numbers (named vector) 
```{r}
crm$saveCellbenderScript(file = "cellbender_script.sh", 
                         fpr = 0.01, 
                         epochs = 150, 
                         use.gpu = TRUE, total.droplets = F, expected.cells = F) # could be total.droplets = droplets (named vector)
```

We can run this script in the terminal (screen session). Here, we activate the environment: conda activate cellbender_v0.3.0 and we run the bash script: sh cellbender_script.sh.

## Plot cellbender results
Plot the changes in cell numbers following CellBender estimations.
```{r, warning=F}
crm$plotCbCells()
```
```{r, eval=F, include=F}
devtools::load_all("/people/nrq364/CRMetrics_LW")
crm %<>% CRMetrics$new()
```

Plot the CellBender training results.
```{r, fig.height=6, warning=F}
crm$plotCbTraining()
```

Plot the cell probabilities.
```{r, fig.height=6, warning=F}
crm$plotCbCellProbs()
```

Plot the identified ambient genes per sample.
```{r, fig.height=6}
crm$plotCbAmbExp(cutoff = 0.005)
```

Lastly, plot the proportion of samples expressing ambient genes. MALAT1 is identified as an ambient gene in almost all samples which is expected.
```{r, warning=F}
crm$plotCbAmbGenes(cutoff = 0.005)
```

## Add cellbender filtered cms 
Then, we can add the cellbender filtered cms to our object. Additionally, it is recommended to generate new summary metrics from the adjusted cms which can then be plotted.
```{r}
crm$cms <- NULL
crm$addCms(cellbender = T)
crm$addSummaryFromCms()
crm$detailed.metrics <- NULL
crm$addDetailedMetrics()
qsave(crm, "crm_cellbender.qs")
```
